ABSTRACT

This paper reviews evaluation and research activities 
of Project MATH (Mathematics Activities for Teaching the Handicapped) 
during 1972-1973, and discusses evaluation of curriculum materials
for educable mentally handicapped (EMH) populations. It briefly
describes field tests in six cities involving over 100 teachers
(primary through junior high levels). The field tests, concerning 
number and operation strands, involved program evaluation, collection 
of biodemographic information, teacher tracking of daily instruction, 
computer processing, and a questionnaire for teachers. The paper also
describes concurrent curriculum review, another review by
mathematicians, and implementation of research studies. The following 
issues are considered: (1) Researchers tend to make evaluation 
designs that overestimate the amount of data necessary for revision, 
(2) Researchers tend to overestimate usefulness of empirical data for 
curriculum evaluation, (3) The "representative" field test has a 
hallowed position it may not deserve, (4) The nature of the educable 
mentally retarded population restricts usefulness of evaluation 
designs that rely on pupil change data, (5) Demonstration of 
effectiveness decreases with increased magnitude of the curriculum 
being developed. It is suggested that many issues thought to require 
large-scale field tests could be determined with a few carefully 
controlled research studies, and that mechanisms and criteria be 
developed to select the best sequence for studies. It is noted that 
Project MATH people have decided to stay with small sample, short 
duration studies. (MC)
The Role of Research and Evaluation in EMH Curriculum Development:

Project MATH 



Project MATH (Mathematics Activities for Teaching the Handicapped) has as
its mission the development and validation of a mathematics curriculum for
handicapped children. The curriculum model adopted for Project MATH provides
for the slower rate of cognitive development of many handicapped children,
provides diagnostic alternatives to instruction, and provides for the
affective and behavioral growth of the child. The curriculum is considered
multiple-option in that it seeks to provide teachers an array of instructional
and content options necessary for teaching youngsters who have failed in the
traditional curriculum.

Evaluation and research are two of the five components of Project MATH (the 
remaining components being development, training, and dissemination).
Evaluative activities of Project MATH (to this juncture) have been restricted
to internal and external interim formative product development (Sanders and
Cunningham, 1973). Research activities have been related to the examination
of issues in assessment of needs and to the performance of handicapped
children on tasks which are analogues of potential curriculum activities.

Briefly, the following sections of this paper will relate the management of 
evaluation and research activities of Project MATH during the 1972-73 academic
year. Following this section, a discussion of issues relating to these areas
will be attempted.

Two strands of directed teaching activities were completed and ready for field
testing by September, 1972. These strands were numbers and operations (primary,
intermediate, and junior levels) and sets and operations (primary level only).
Each strand included more activities than could be taught during any seven
months. For each arithmetic topic, a range of activities was included that
differed along the dimension of teacher-pupil instructional interaction.



The purposes of the evaluation effort were: (1) to determine the percentage
of teachers who would employ the lesson guides on a regular basis; (2) to
determine the sequences of activities chosen by the teachers; (3) to determine
the number of activities taught during the instructional day; (4) to determine
the teacher ratings of pupil performance on each lesson; and (5) to determine
the teacher ratings of the quality of each lesson guide.

Each classroom to which Project MATH materials were distributed furnished a
list of the children with the following biodemographic information: age,
sex, race, IQ, administrative classification of handicap, and parental
occupation (this variable was not used because of lack of data). Additionally,
each classroom was described as either self-contained or resource room.

Teachers tracked their instruction on a daily basis. Each activity taught to
a child was recorded. All lessons were evaluated along two dimensions:
teacher judgment of pupil performance on a three-point continuum (failure,
learning, and mastery) and teacher judgment of the quality of the lesson as
written (good, adequate, or poor).

Data sheets were collected on a regular basis, keypunched, and entered into
a master computer data file. Additionally, an evaluative questionnaire
composed of Likert-type items regarding the curriculum was sent to all
teachers near the end of the project year.
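
As an illustration of how the daily tracking records described above might be
tallied, the following is a minimal sketch in Python. The rating categories are
those named in the text; the record layout, field names, identifiers, and sample
values are hypothetical and do not reproduce the project's keypunch format or
its data.

```python
from collections import Counter, defaultdict

# Rating scales named in the text.
PERFORMANCE = ("failure", "learning", "mastery")
QUALITY = ("good", "adequate", "poor")

# Hypothetical sample records: one row per activity taught to a child.
records = [
    {"teacher": "T01", "child": "C01", "activity": "A1",
     "performance": "mastery", "quality": "good"},
    {"teacher": "T01", "child": "C02", "activity": "A1",
     "performance": "learning", "quality": "good"},
    {"teacher": "T02", "child": "C03", "activity": "A1",
     "performance": "failure", "quality": "adequate"},
    {"teacher": "T02", "child": "C03", "activity": "A2",
     "performance": "learning", "quality": "poor"},
]


def tally_by_activity(rows, field, categories):
    """Count how often each rating category was recorded for each activity."""
    counts = defaultdict(Counter)
    for row in rows:
        if row[field] not in categories:
            raise ValueError(f"unknown {field} rating: {row[field]!r}")
        counts[row["activity"]][row[field]] += 1
    # Report every category, including those never recorded (count 0).
    return {act: {c: ctr[c] for c in categories} for act, ctr in counts.items()}


performance_counts = tally_by_activity(records, "performance", PERFORMANCE)
quality_counts = tally_by_activity(records, "quality", QUALITY)
print(performance_counts)
print(quality_counts)
```

A per-activity tally of this kind speaks to purposes (4) and (5) above; the
other purposes would need the date and sequence of instruction, which this
sketch does not model.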

In addition to the field testing of the curriculum, internal review of the
curriculum proceeded even after the distribution of the materials. This
review centered upon the consistency of the lessons with the development model,
adequacy of the directions, and clarity of the mathematics content. As a
supplement to this review process, an advisory board consisting of
professionals in mathematics education and special education was asked to
respond in writing to the adequacy and clarity of the instructional materials.



The research program was developed independently from the evaluation
process. A series of research studies was designed by project personnel
as a means of providing input into future decisions regarding development
of materials or instructional tactics. Studies were proposed to the
project management and reviewed for significance of potential data for
decision-making, availability of necessary resources, and design clarity.
An independent data acquisition team was recruited to collect data for the
studies.


The proposer of the study assumed responsibility for procuring the
necessary testing materials, training the data acquisition team in the
experimental procedures, monitoring the experimental procedures, designing
the data analysis systems, and interpreting the results and writing the
project report. Subjects were acquired from school systems local to the
project, many of which were not involved in field testing of materials.

The use of a data acquisition team independent from the regular project staff
allowed the project to conduct approximately twelve studies without
significantly diverting human resources from the curriculum development
efforts. For each week a research study was operative, only one staff member
was engaged in the monitoring of that study.



Having completed a full year of managing both a moderate scale field test 
(over 100 teachers in six cities) of instructional materials and a 
continuous research program, it is useful to reflect upon issues relating 
to research and evaluation in curriculum development for handicapped 
populations. The conclusions or implications drawn represent solely the
opinion of this writer and should not be judged as representing the thinking 
of other project staff members or the funding agency. 







(1) Evaluation designs developed by researchers tend to overestimate the
amount of data necessary to the formative process of revision. The necessary
corollary to this first generalization is that necessary formative information
must then be abstracted from a wealth of data, diverting time and other
resources away from the use of the necessary data in the revision process.

A basic decision must be made as to how many resources should be earmarked
for collection of ancillary data, not directly applicable to the revision
process, whose ultimate value will rest as a data base to test research
hypotheses regarding the use of the curriculum.

The temptation is all too real to collect data from an available population
on an array of instructor and learner characteristics. Each piece of data
can be justified in terms of legitimate research hypotheses, explicated or
potential. However, the aggregate effect of this data collection process
may be to obscure the focus of the evaluation effort. One should
clearly delineate that aspect of the data pool which will yield direct
formative payoff and develop procedures that maximize the acquisition and
analysis of that information.

(2) We tend to overestimate the usefulness of empirical data in the process
of formative evaluation of curriculum. Research designs rely heavily upon
empirical sources of data. Evaluation designs, often developed by researchers,
may tend to underemphasize the collection of the non-empirical: teacher comment
and revision of lessons, expert logical or rational analysis of materials, and
child reaction to the curriculum.

It could be that we spend most of our resources on collection of empirical
data, but rely most heavily on the "soft" data for ultimate revision decisions.
Our uncomfortableness with such data sources may reflect our training more than
an objective analysis of the usefulness of such data sources.
(3) We have elevated the "representative" field test to a hallowed position
it may not deserve. The literature in curriculum evaluation presumes that
materials should be field tested as a component of formative evaluation.
Further, the objective of the field test is to demonstrate or fail to
demonstrate pupil change in response to components of the program. Failure
to demonstrate change becomes the stimulus to the revision process.

Field testing is an expensive process. Materials must be printed and dissemin- 
ated, teachers must be given some minimum level of in-service training, site 
visits must be conducted, and empirical determinations of pupil progress made. 
The revision cycle for those materials must be delayed often for the entire 
duration of the field test and for the necessary time for data reduction, 
analysis, and interpretation. 

The fact is that this entire set of procedures comprises an act of faith. 
To my knowledge there is no research that indicates that field testing a 
product to determine formative revision on the basis of pupil change data 
results in a more effective product in a summative sense. Here, one is 
not referring to pilot tryouts of materials for the purpose of collecting
criticism and revisions of the materials, but rather to the larger-scale
field tests we are all familiar with.

(4) The nature of the EMR population restricts the usefulness of evaluation
designs that rely upon pupil change data. Generally, EMH children enter
the special education system after a number of years of unsuccessful
adaptation to instruction in the regular grades. Epidemiological surveys
indicate little special education placement prior to the third grade. Thus,
from the outset, special education is forced into a remedial as well as a
developmental mode in response to the education of the EMH. Rather than
expecting a learner to be intact in regard to necessary prerequisite skills
and understandings, it is more likely that many children will have specific
learning disabilities.

For the average child, one might be able to project an expectation of one
year's growth in grade level achievement for each year the child is exposed
to a developmental curriculum. If the EMR population were characterized as
intact and appropriately placed in a developmental curriculum, one might
project growth expectancies in relation to average IQ. That is, if the
population has a mean IQ of 75, a growth rate of 3/4 of a grade equivalent
could be projected. However, in a population characterized by failure sets,
inappropriate development of prerequisite skills, and large experience gaps,
what standard should become the criterion for judgement of adequacy?
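
The projection described above reduces to simple arithmetic. A minimal sketch
in Python follows, assuming that the expectancy is obtained by scaling the
average child's one grade equivalent per year by mean IQ divided by 100. The
function name and the strictly linear rule are illustrative assumptions drawn
from the single example in the text, not an established formula.

```python
def projected_growth(mean_iq, years=1.0):
    """Project grade-equivalent growth for a population.

    Assumed rule: the average child (IQ 100) gains one grade equivalent
    per year, and the expectancy scales linearly with mean IQ.
    """
    if mean_iq <= 0:
        raise ValueError("mean_iq must be positive")
    if years < 0:
        raise ValueError("years must not be negative")
    return (mean_iq / 100.0) * years


# The example from the text: a mean IQ of 75 over one year of instruction.
print(projected_growth(75))
# The reference case of the average child.
print(projected_growth(100))
```

The sketch only makes the intact-population assumption explicit. It offers no
answer to the question just raised, since the rule presumes learners whose
prerequisite skills are in place.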

Does substituting a criterion-referenced framework for a norm-referenced
framework resolve this difficulty? Rather than looking toward a grade
equivalent standard, we shift our emphasis to a series of program objectives.
The mastery of "x" number of objectives becomes the program goal. However,
we are still forced into making some a priori decision regarding the number
of objectives mastered that we are willing to accept as a criterion of adequate
progress. If the system lacks this a priori standard, the process of
evaluation becomes totally descriptive without a judgemental component.

Accordingly, we in fact impose normative expectations upon our criterion-
referenced measurements. This is where the system causes difficulty for
judging EMR populations' responses to a curriculum. Divergent achievement
patterns often may be the rule rather than the exception. What judgement
does one make of the efficacy of a curriculum where only a minimum number
of objectives are mastered?

The process of judging the adequacy of an instructional sequence by
achievement also deserves closer scrutiny because of inherent population
characteristics. In this model, failure is most often presumed to lead to
revision of the instructional materials. However, unless the instructional
program has identified all specific prerequisite learning sets and task
requirements (a dubious assignment in view of our present limited knowledge
base), it may be that failure is a function of a lack of readiness on the part
of the child.

Instructionally, for the single child, we accommodate a wide range of
individual differences by specifically allowing for alternative instructional
strategies or alternative objectives. However, evaluation designs have
traditionally been limited to group data decisions. Achievement of objectives
is evaluated in terms of group average achievement standards. An objective
which shows a low group average achievement may not need revision of the
instructional program. The instructional program may have been successful
for those students who were ready for the program objective and will be
successful for other students at another juncture in the instructional program.
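
The point about group averages can be illustrated with a small numerical
sketch in Python. The pupil records below are invented for illustration, and
the "ready" flag is a hypothetical stand-in for whatever readiness judgment an
evaluator might have available; nothing here is project data.

```python
def mastery_rate(outcomes):
    """Return the fraction of pupils recorded as mastering the objective."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Hypothetical pupils: (had the prerequisite skills, mastered the objective).
pupils = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, False),
    (False, False), (False, False),
]

overall = mastery_rate([mastered for _, mastered in pupils])
ready = mastery_rate([mastered for was_ready, mastered in pupils if was_ready])
not_ready = mastery_rate(
    [mastered for was_ready, mastered in pupils if not was_ready]
)

print(f"group average:    {overall:.2f}")
print(f"ready pupils:     {ready:.2f}")
print(f"not-ready pupils: {not_ready:.2f}")
```

In this invented case the group average is low even though most of the pupils
who were ready mastered the objective, which is the situation in which a low
average would not by itself argue for revising the lesson.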

(5) The larger the magnitude of the curriculum being developed, the less
chance of demonstrating effectiveness through the formative process. The
truth value of this statement will be modified by the length of time allowed
for this evaluation. The shorter the duration allowed for the evaluative
process, the less likelihood of demonstrating effectiveness (that is, by the
usual criteria of empirical data relating to achievement). Additionally, the
older the age of children for whom the curriculum is introduced, if it is
for a developmentally organized subject matter, the less likely effectiveness
can be demonstrated.

A developmentally organized curriculum in mathematics or reading, which is so
heavily dependent upon some sequential order of skill mastery and spans a wide
range of chronological age development, relies upon cross-sectional evaluation
designs. That is, unless one was willing to wait six or more years for
a longitudinally organized formative evaluation, it must be introduced
with children at different age levels, and with children who were not
previously exposed to the curriculum. This maximizes the impact of
prerequisite skill lags or omissions and minimizes the probability of
demonstrating effectiveness.

The feeling of this writer is that we must critically examine previous
assumptions regarding what constitutes adequacy of the formative evaluation
process. This writer would argue for more reliance upon review of materials
and instructional programs by various levels of "expert" opinion: subject
matter specialists, educational psychologists and/or special educators,
teachers, and administrators. Better review procedures and research on the
process of review should be developed.

Field testing of the materials could be viewed as a component of this
review process. Data on teacher ease of implementation, attitudes toward
the program, changes in the intended instructional activities, and specific
pupil performance difficulties encountered should be collected. Once we
move the field test from an inherent research model to a more purely
evaluative model, the requirements for large, "representative" field test
populations could be reduced. One might then opt for more intensive study
of a smaller number of teachers who are motivated to participate fully in
vigorous formative review.


From its inception, Project MATH has invested heavily in a research component
separate in its organization from the evaluation process. Most of the
research studies have used small, carefully selected samples of children to
test carefully structured curriculum-related hypotheses. A large majority
of these studies have focused upon verbal problem solving processes. The
impact of many of these studies is directly reflected in the curriculum,
specifically the verbal problem solving component.

Overall, the pay-off from the research investment can be judged as
considerable. However, this pay-off was less than maximal due to a degree of
independence given to the proposers of the research endeavors, independence
from the priority of research needs mandated by the development efforts. It
is probably a more productive arrangement to have research studies organized
from questions raised by the development process, rather than relying upon
the probability that independently generated research studies will have an
impact upon the development process. Perhaps many issues we have attempted
to resolve through the large-scale field test could be determined through a
limited number of carefully controlled research efforts.

A potentially useful area of future inquiry is what management mechanisms
and decision criteria must be developed in order to select the most heuristic
sequence of research studies to be completed. Our project has elected to
stay with small sample, short duration studies. Whether such a research
program organization is the most effective is open for discussion.

This paper has attempted to develop several issues that should and must be
discussed by those interested in the process of curriculum development and
evaluation for handicapped populations. Some of these issues may even extend
to curriculum development for "average" learners, to the extent that individual
differences may prove the universal phenomenon in education.